ABSTRACT

The nucleotide sequence of the RNA encoding the nucleocapsid protein of
coronavirus MHV-A59 has been determined. Copy DNA was prepared from mRNA
isolated from virally infected cells, fragmented and cloned in the phage
vector M13 mp8 for direct sequence determination. A sequence of 1817
nucleotides, adjacent to the viral poly-A tail, was obtained. It contains
a single long open reading frame encoding a protein of mol. wt. 49,660,
which is enriched in basic residues.


INTRODUCTION 

The coronaviruses comprise a large group of enveloped RNA viruses isolated 
from a range of animal hosts (for review see 1). They have been associated 
with a variety of respiratory and gastro-intestinal infections and neurolo- 
gical disorders, and may also provide a model for the study of persistent 
viral infection. 

The coronavirus strain A59, a mouse hepatitis virus, can be propagated in
cell culture, and its molecular biology has been studied in some detail. The
virion contains a single positive-stranded RNA, 18 kb in length, associated
with a nucleocapsid protein (2). The viral lipid envelope contains two
glycoproteins: E1, of mol. wt. 24,000, which occurs in unglycosylated as well
as O-glycosylated forms (3,4,5), and E2 (mol. wt. 90,000/180,000), which forms
the surface projections, or peplomers, characteristic of the coronavirus
virion (2,4).

Two features of the life cycle of coronaviruses are of particular interest at
the molecular level. First, the viral mRNA's produced during infection form
a nested set, corresponding to the 3' end of the virion RNA but extending
to different lengths towards the 5' end: the largest is identical to the
virion RNA (6-12). From each of the RNA's (seven in the case of MHV-A59), only
the 5'-proximal gene is translated (7,13,14). Thus, the coronaviruses have a
replication strategy which differs from any so far reported for RNA viruses.
Secondly, the virus buds intracellularly, in endoplasmic reticulum and perhaps
Golgi membranes (15,16). The factors which specify this site of assembly,
rather than the plasma membrane, are at present unknown.
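The nested-set arrangement can be made concrete with a small model. The sketch below is a minimal illustration only, assuming a hypothetical genome of seven genes: the names gene1 to gene7 and all lengths are placeholders, not MHV-A59 values, and the poly-A tail is ignored. Each mRNA runs from a different gene to the shared 3' end, the largest covers the whole genome, and only the 5'-proximal gene of each is treated as translated.

```python
from dataclasses import dataclass

# Hypothetical genome: gene names and lengths (nucleotides) are placeholders,
# listed 5' -> 3'. They are NOT the MHV-A59 values; poly-A is ignored.
GENOME = [("gene1", 10000), ("gene2", 4000), ("gene3", 1000), ("gene4", 500),
          ("gene5", 300), ("gene6", 700), ("gene7", 1500)]


@dataclass
class MRNA:
    number: int  # RNA1 is the largest; the highest number is the smallest
    genes: list  # genes carried by this mRNA, 5' -> 3'

    @property
    def translated_gene(self) -> str:
        # Only the 5'-proximal gene of each mRNA is translated.
        return self.genes[0]

    def length(self) -> int:
        sizes = dict(GENOME)
        return sum(sizes[g] for g in self.genes)


def nested_set(genome):
    """One mRNA per gene: each starts at that gene and runs to the shared 3' end."""
    names = [name for name, _ in genome]
    return [MRNA(i + 1, names[i:]) for i in range(len(names))]


for rna in nested_set(GENOME):
    print(f"RNA{rna.number}: {rna.length()} nt, translates {rna.translated_gene}")
```

Every mRNA in the model ends with the same 3'-most gene, yet each contributes a different translation product, which is the point the paragraph above makes.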

Here we report a nucleotide sequence of cloned copy DNA prepared from MHV-A59
mRNA. A sequence of 1817 nucleotides, ending in the polyadenylation site,
has been determined. Translation of the sequence predicts a polypeptide whose
size, general features and genetic location are consistent with its being
the viral nucleocapsid protein.


MATERIALS AND METHODS 

cDNA preparation 

Total poly(A)+ RNA from MHV-A59-infected Sac(-) cells was prepared as
described (17). First-strand cDNA was synthesized in a mixture containing RNA
(50µg), Tris-Cl pH 8.3 (50mM), KCl (50mM), MgCl2 (8mM), dithiothreitol (1lmM),
oligo-dT > 18 (1µg, P.L. Biochemicals), sodium pyrophosphate (2mM), dATP,
dGTP, dTTP (each 1mM), dCTP (0.5mM), [α-32P]dCTP (500µCi; Amersham) and AMV
reverse transcriptase (350 units; kindly provided by Dr. J. Beard) in a total
volume of 200µl. The mixture was incubated for 30 min at 41°C, then for 15 min
at 45°C. EDTA was added to 20mM, the material was extracted twice with
phenol/chloroform and the pooled aqueous phases were extracted twice with
ether. The products were precipitated with ethanol and redissolved in 50µl
5mM Tris, 1mM EDTA pH 7.5. The material was loaded on a 1%
low-melting-temperature agarose (BRL) gel, and electrophoresis was carried out
at 20V/cm for 60 min to remove low molecular weight material. Regions of the
gel containing cDNA were identified by autoradiography and cut into 1mm
slices; each slice was melted and phenol-extracted, and the cDNA precipitated
with ethanol. RNA was hydrolysed by incubating the material from each slice in
0.2M KOH for 10 min at 65°C in a volume of 20µl, and the mixture was
neutralized with HCl. The cDNA was converted to the double-stranded form in a
mixture containing HEPES/KOH, pH 6.9 (100mM), MgCl2 (4mM), dithiothreitol
(0.5mM), dATP, dCTP, dGTP and dTTP (each 1mM), KCl (50mM) and Klenow
polymerase (20 units; Boehringer) in a volume of 100µl. The mixture was
incubated for 3 hrs at 17°C, and the DNA precipitated with ethanol. Yields
obtained from each gel fraction were measured by TCA precipitation of aliquots
on filters, and scintillation counting.
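The TCA-precipitation step gives the fraction of label incorporated; turning that into a cDNA mass is simple arithmetic. The function below is a sketch of one common way to do it, not necessarily the calculation used here. It assumes that the labelled dCTP is incorporated in proportion to the whole dCTP pool, that C makes up a quarter of the cDNA, and an average of 330 g/mol per nucleotide residue; the function and parameter names are hypothetical. Since 1mM × 1µl = 1nmol, 0.5mM dCTP in a reaction volume taken as 200µl corresponds to 100nmol.

```python
def cdna_mass_ng(precipitable_cpm, total_cpm, dctp_mM, volume_ul,
                 c_fraction=0.25, nt_mass=330.0):
    """Estimate first-strand cDNA mass (ng) from TCA-precipitable counts.

    Assumptions: labelled dCTP is incorporated in proportion to the total dCTP
    pool, C is `c_fraction` of cDNA nucleotides, and each nucleotide residue
    averages `nt_mass` g/mol (numerically equal to ng/nmol).
    """
    if total_cpm <= 0 or not 0 <= precipitable_cpm <= total_cpm:
        raise ValueError("need 0 <= precipitable_cpm <= total_cpm, total_cpm > 0")
    fraction_incorporated = precipitable_cpm / total_cpm
    dctp_nmol = dctp_mM * volume_ul          # mM x µl = nmol
    dc_incorporated_nmol = fraction_incorporated * dctp_nmol
    total_nt_nmol = dc_incorporated_nmol / c_fraction
    return total_nt_nmol * nt_mass


# Example: 1% of the input counts precipitable, 0.5mM dCTP in 200µl.
print(cdna_mass_ng(10_000, 1_000_000, dctp_mM=0.5, volume_ul=200))
```

With 1% incorporation, 1nmol of dC is incorporated, implying 4nmol of total nucleotide, or about 1.3µg of first-strand cDNA under these assumptions.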

Fragmentation and cloning of cDNA

Portions (30ng) of the double-stranded cDNA were cleaved with one of the
restriction enzymes HaeIII (kindly supplied by V. Pirrotta), FnuDII or RsaI
(both New England Biolabs) under standard conditions. The DNA was purified
by phenol extraction and ethanol precipitation, and randomly ligated to 100ng
M13 mp8 (18) replicative-form DNA which had previously been linearized with
SmaI (New England Biolabs), in a mixture containing Tris-HCl, pH 7.5 (50mM),
MgCl2 (10mM), dithiothreitol (1mM), ATP (0.2mM) and T4 ligase (6 units; kindly
supplied by R. Brown) in a volume of 10µl. The ligations were incubated
overnight at room temperature.
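The enzymes used for fragmentation (HaeIII, GG^CC; FnuDII, CG^CG; RsaI, GT^AC) all leave blunt ends, as does SmaI (CCC^GGG), which is what allows the fragments to be ligated directly into the SmaI-cut vector. A minimal in-silico digestion sketch follows; the helper names are hypothetical, and because all four recognition sites are palindromic, scanning one strand is enough.

```python
import re

# Recognition site and cut offset (bases from the 5' end of the site, top
# strand). All four enzymes cut in the middle of a palindrome: blunt ends.
ENZYMES = {
    "HaeIII": ("GGCC", 2),    # GG^CC
    "FnuDII": ("CGCG", 2),    # CG^CG
    "RsaI":   ("GTAC", 2),    # GT^AC
    "SmaI":   ("CCCGGG", 3),  # CCC^GGG
}


def cut_positions(seq, enzyme):
    """0-based positions in seq where the enzyme cuts (between pos-1 and pos)."""
    site, offset = ENZYMES[enzyme]
    # Zero-width lookahead so overlapping sites (e.g. CGCGCG) are all found.
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq.upper())]


def digest(seq, enzyme):
    """Fragments (top strand) of a linear double-stranded sequence."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


print(digest("AAGGCCTTGGCCAA", "HaeIII"))  # ['AAGG', 'CCTTGG', 'CCAA']
```

Cutting at every overlapping site, as here, gives the complete digest; a real partial digest would leave some sites uncut.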

Alternatively, cDNA was treated with nuclease S1. cDNA (80ng) was incubated
in 60µl of a solution containing sodium acetate (30mM, pH 5.2), NaCl (0.3M),
ZnCl2 (2mM) and 10 units S1 nuclease (BRL) for 3 min at 37°C and 10 min at
room temperature. EDTA was added to 5mM, the mixture was extracted twice with
phenol/chloroform and washed with chloroform and ether, and the DNA was
precipitated with ethanol; approximately 50% of the input radiolabel remained
TCA-precipitable. The DNA was then treated with Klenow polymerase in a 10µl
volume containing Tris-Cl (10mM, pH 7.5), MgCl2 (10mM), NaCl (50mM),
dithiothreitol (0.5mM), all four deoxynucleotides (each 0.4mM) and Klenow
polymerase (1 unit) for 15 min at room temperature. EDTA was added to 20mM and
the mixture was phenol-extracted and ethanol-precipitated as before. The DNA
was then ligated to SmaI-digested M13 mp8 exactly as for restricted cDNA.

The products of the ligation reactions were then used to transfect competent
E. coli cells; lac⁻ plaques were picked and grown, and the viral DNA was
isolated and used directly for single-nucleotide dideoxy-sequencing of the
inserted DNA (all as in 19), without any prior screening. Clones of interest
were sequenced using a 15-base synthetic single-strand oligonucleotide (P.L.
Biochemicals) as universal primer. Reaction products were analysed on 0.2mm
thick thermostatted 6% acrylamide gels (20), 40cm or 60cm in length.
Sequences were stored, and overlaps identified, by computer, using the
program package of Staden (21).
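Finding overlaps between shotgun reads amounts to looking for a suffix of one read that matches a prefix of another, on either strand, since inserts can be cloned in either orientation. The sketch below uses exact matching with a minimum overlap of `min_len` bases; it is a simplified illustration, not the algorithm of the Staden package, and the function names are hypothetical. Real assembly programs must also tolerate sequencing errors and reads contained entirely within others.

```python
def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def overlap(a, b, min_len=20):
    """Length of the longest suffix of a equal to a prefix of b
    (exact match, at least min_len bases), or 0 if there is none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-n:] == b[:n]:
            return n
    return 0


def find_overlaps(reads, min_len=20):
    """Yield (i, j, strand, length) for each ordered pair of reads that
    overlaps; strand is '-' when read j had to be reverse-complemented."""
    for i, a in enumerate(reads):
        for j, b in enumerate(reads):
            if i == j:
                continue
            for strand, bb in (("+", b), ("-", revcomp(b))):
                n = overlap(a, bb, min_len)
                if n:
                    yield i, j, strand, n


reads = ["ATGCGTACCTTAGGCA", "TTAGGCAGATCCGAT"]
print(list(find_overlaps(reads, min_len=5)))
```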


RESULTS AND DISCUSSION 

"Shotgun" cloning of viral cDNA 

A total of 48 recombinant clones were sequenced; of these sequences, 33 could
be assembled into a contiguous consensus of 1817 nucleotides, followed by a
poly-A tract of variable length (Fig.1). The remaining 15 clones, none of
which overlapped in sequence with one another, presumably arose from
cellular mRNA present in the starting material, or from regions towards the
5' end of the viral genome.

Several of the clones generated by S1-nuclease digestion of the cDNA diverged
at one end from the main consensus (Fig.1). With the possible exception of
clone S9 (see below), these were probably due to short 3' extensions remaining
on some cDNA fragments after treatment with S1 nuclease and Klenow polymerase;
these could promote simultaneous ligation of more than one fragment to a
single vector molecule. Therefore sequences obtained from S1-derived clones
were used only to confirm regions of the consensus obtained from
restricted-cDNA clones.

Structure of the MHV-A59 RNA's 

A sequence of 1817 nucleotides, adjoining a poly-A tract, is shown in Fig.2.
The viral origin and strand orientation of the sequence were confirmed by
dot hybridization of radiolabelled, fragmented virion RNA to the single-strand
recombinant DNA's (not shown).

Comparison of the sequence with the analysis of RNase T1 oligonucleotides from
various viral RNA's reported by Lai et al. (22) showed good, although not
perfect, agreement between the two sets of data (Table 1). Of the nine
oligonucleotides isolated from RNA7, seven have closely corresponding
sequences in Fig.2. The two which were not identified, spots 10 and 19,
probably correspond to two spots which are also not found in 3' fragments of
the virion RNA (6).
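RNase T1 cleaves RNA after every G residue, so each T1 oligonucleotide ends in G and contains no internal G. Predicting the T1 oligonucleotides of a sequence for comparison with fingerprint spots is therefore a matter of splitting after each G. The sketch below does this on the mRNA-sense cDNA sequence, reading T as U; the function names and the 10-nucleotide cutoff for "large" oligonucleotides are assumptions chosen for illustration, not values taken from Lai et al.

```python
import re


def rnase_t1_fragments(seq):
    """Split a sequence after every G, as RNase T1 does; T is read as U.

    The last fragment lacks a 3' G if the sequence does not end in G.
    """
    rna = seq.upper().replace("T", "U")
    return re.findall(r"[^G]*G|[^G]+$", rna)


def large_t1_oligos(seq, min_len=10):
    """T1 oligonucleotides at least min_len long (assumed cutoff for
    fragments large enough to give distinct fingerprint spots)."""
    return [frag for frag in rnase_t1_fragments(seq) if len(frag) >= min_len]


print(large_t1_oligos("GTAATCTAAACTTTAAGG"))
```

Only the long, G-free stretches are informative in a fingerprint, which is why a comparison of this kind focuses on them.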

It has been proposed that these oligonucleotides reflect the existence of a
short 5' leader sequence common to all seven viral RNA's, and that
oligonucleotides 19 and 19a cross the joining site between the leader
sequence and the coding region in RNA's 7 and 6 respectively (8,6).
Nucleotides 137-152 of Fig.2 show some similarity to both of these
oligonucleotides, and also to spot 17, which is found in all the larger
RNA's, but not RNA7 (see Table 1). If the


[Figure 1 map: clones on a 0-1800 nucleotide scale, in two groups: restricted-cDNA clones (R) and S1-nuclease clones]


Figure 1.

Arrangement of clones used to construct the nucleocapsid sequence. Arrows show
the direction in which the sequence was determined (the complement of the M13
single-strand DNA). The 5' end of clone F9 is an anomalous cleavage; all other
restricted-cDNA clones are bounded by correct restriction sites. The sequence
of clone F15 was readable beyond 600 nucleotides with sufficient accuracy to
confirm its overlap with clone H9. Diagonal lines represent regions of
divergence from the consensus sequence.
CGCTTATAAGTGCAAAGGTGTGACACCTI AGCAACTGTAACATAAGCGGGCAAGTICAGAC
10 20 30 40 50 60 


AAAGAGAAACCGGTGCCTAGCTTAACAGCCTTGCAT GT AGAGGTGGCCACGAATAATAGT 
70 80 90 100 110 120


4H S F WV P G Q@ E N 
GGCTGTTAGTGTATGGTAATCTAAACTTTAAGGATGTICTTTTGTICCTGGGCAAGAAAAT 
130 140 150 160 170 180 


A GGR s&s § § VN R A GCG N GCG TL K K T T 
GCCGGTGGCAGAAGCTCCTCTGTAAACCGCGCT GGT AATGGAATCCTCAAGAAGACCACT 
190 200 210 220 230 240 


H A D Q F E R GP NH N Q N R G KR R N Q P 
TGGGCTGACC AAACCGAGCGTGGACCAAATAATCAAAATAGAGGCAGAAGGAATCAGCCA 
250 260 270 280 290 300 


kK 9 T A T T OQ P NHN S GS VeV P H Y S H F 
AAGCAGAC TGCAAC TACTCAACCCAAC TCCGGGAGTGTGGTTCCCCATTACTCCIGGTTT 
310 320 330 340 350 360 


Ss G Tt T 9 F Q K GK E F OQ F A EE GQ EG Y¥ 
TC TGGCATTACCCAGTTCCAAAAGGGAAAGGAGTTTCAGTTTGCAGAAGGACAAGGAGTG 
370 380 390 400 410 420 


P T A WN G IT P A S € OQ K GY H Y R H NA 
CCTATTGCCAATGGAATCCCCGCT TCAGAGCAAAAGGGATATTGGTATAGACACAACGCC 
430 440 450 460 470 480 


VvoetutcK H FC © 6@ S§S R S$ N YY C P O G TIT F T 
GTTCTTTTAAAACACCTGATGGGCAGCAGAAGCAATTACTGCCCAGATGGTATTTTTACT 
490 500 510 520 530 540 


1 tc A O9 GCG P MBM LL FE P V Hw E FT A L K E S S$ 
ATCTTGGCAC AGGGCCCCATGCTGGAGCCAGT TATGGAGACAGCATIGAAGGAGTCTITCT 
550 560 570 580 590 600 


G t © T A K R FT P T P A L TT bt S$ K GCG T @ 
GGGTTGCAAACAGCCAAGCGGACACCAAT ACCCGCTCTGATATTGTCGAAAGGGACCCAA 
610 620 630 640 650 660 


A V A R LE F LL GCG ELE R P A R YY C KL R A F 
GCAGTCATGAGGCTATTCCTACTAGGTTTGCGCCCGGCACGGTATTGCCTCAGGGCTTITT 
670 680 690 700 710 720


Mo LK AL E€ GLE H LL LC A OD L VY R GCG H NH P 
ATGTTGAAGGCTCTGGAAGGTCIGCACCTGCTAGCCGATCTGGTTCGCGGTCACAATCCC 
730 740 750 760 770 780 


Y GO 1TH R AR S$ S$ S N OR OP AS T VY 
GTGGGCCAAATAATGCGCGCTAGAAGCAGTTCCAACCAGCGCCAGCCTGCCTCTACTGTA 
790 800 810 820 830 840 


K P D M A E E T A ALC VY L A K L GCG K DA 
AAACCTGATATGGCCGAAGAAAT TGCTGCTCTTGTTTIGGCTAAGCTCGGTAAAGATGCC 
850 860 870 880 890 900 
G Q@ P K O ¥Y T K @ §S A K K VR Q K YT LN
GGCCAGCCCAAGCAAGTAACGAAGCAAAGTGCCAAAAAAGTCAGGCAGAAAATTTTAAAC 
910 920 930 940 950 960 


kK P R Q@ K R T PN K OQ C P WV Q@ @ C F G K 
AAGCCTCGCC AAAAGAGGACTCCAAAC AAGCAGTGCCCAGTGCAGCAGTGTTTTGGAAAG 
970 980 990 1000 1010 1020 


R GP N O N F GG S$ E A LE K L GG T S D P 
AGAGGCCCCAATCAGAATTTTGGAGGCTCTGAAATGTTAAAACTTGGAACTAGTGATCCA 
1030 1040 1050 1060 1070 1080 


QoQ F P T tC A E bt A P T WV G A F F F G S K 
CAGTTCCCCATTCTTGCAGAGTTGGCTCCAACAGTTGGTGCCTICTICTTTGGATCTAAA 
1090 1100 1110 1120 1130 1140 


tc € Lt V K K N §$ G GCG A O0O E P TT K DO V Y E: 
TT AGAATTGGTCAAAAAGAATTCTGGTGGTGCTGATGAACCCACCAAAGATGTGTATGAG 
1150 1160 1170 1180 1190 1200 


t Q@ YY S$ GA VR F O S T tL P GCG F E FT TI & 
CTGCAATATTCAGGTGCAGTTAGATTTGATAGTACTCTACCTGGTTTTGAGACTATCATG 
1210 1220 1230 1240 1250 1260 


kK VY t N E N LN A Y Q K O GCG G AD VW VS 
AAAGTGTTGAATGAGAATTTGAATGCC TACCAGAAGGATGGTGGTIGCAGATGTGGTGAGC 
1270 1280 1290 1300 1310 1320 


P K P Q R K GR R Q A Q € K€ K DOD E V DO N 
CC AAAGCCCC AAAGAAAAGGGCGTAGACAGGCTCAGGAAAAGAAAGATGAAGTAGATAAT 
1330 1340 1350 1360 1370 1380 


ves VW AK P K § S V@ R M VY S R EL T P 
GTAAGCGTTGCAAAGCCCAAAAGCTCTGTGCAGCGAAATGTAAGTAGAGAATTAACCCCA 
1390 1400 1410 1420 1430 1440 


E odo R S$ tL tL A Qt eL.oOd DG VY VvVP DO Ge E 
GAGGATAGAAGTCTGTTGGCTCAGATCCT AGATGATGGCGTAGTGCCAGATGGGTTAGAA 
1450 1460 1470 1480 1490 1500 


0 DB S$ N VY * 
GATGACTCTAATGTGTAAAGAGAATGAATCCTATGTCGGCGCTCGGTGGTAACCCTCGCG 


1510 1520 1530 1540 1550 1560 
AG AAAGTCGGGATAGGACACTCUTCTATCAGAATGGATGTCTTGCTGTCATAACAGATAGA 

1570 1580 1590 1600 1610 1620 
GAAGGTTGTGGCTGCCCTGTATCAATT AGT TGAAAGAGAT TGCAAAATAGAGAATGIGTG 

1630 1640 1650 1660 1670 1680 
AG AGAAGTTAGCAAGGTCCTACGTCTAACCATAAGAACGGCGATAGGCGCCCCCTIGGGAA 

1690 1700 1710 1720 1730 1740 
GAGCTCACATCAGGGTACTATTCTTGCAATGCCCTAGTAAATGAATGAAGTTGATCATGG 

1750 1760 1770 1780 1790 1800
CCAATTGGAAGAATCACAAAAAAAAAAAAAAAAAAAAAAA 

1810 1820 1830 1840 


Fig.2 Sequence of the MHV-A59 nucleocapsid gene and protein.
components of all three oligonucleotides are ordered to maximise homology
with the cDNA sequence (Fig.3), the predicted sequences are consistent with
the earlier assignments of spots 19 and 19a by Lai et al. (22), and in
addition suggest that spot 17 represents an internal region of the larger
RNA's which is immediately upstream from the nucleocapsid gene. It is of
interest that clone S9 diverges from the consensus sequence (Fig.1) at almost
exactly the inferred site of divergence between oligonucleotides 19 and 17,
suggesting that the divergence is due not to a cloning artefact, but to the
clone originating from a different viral RNA.
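When only the components of an oligonucleotide are known, not their order, the ordering that best matches the cDNA can be found by brute force: try each permutation and score its best ungapped placement by the number of identical positions. The sketch below assumes a small number of components (the permutation count grows factorially) and simple identity counting; the function names and the example components are hypothetical, while the target is the sequence row of Fig.3.

```python
from itertools import permutations


def best_ungapped_match(query, target):
    """Best (identities, offset) for query placed without gaps inside target."""
    best = (-1, -1)
    for off in range(len(target) - len(query) + 1):
        window = target[off:off + len(query)]
        ident = sum(q == t for q, t in zip(query, window))
        if ident > best[0]:
            best = (ident, off)
    return best


def order_components(components, target):
    """Try every order of the components; return (identities, offset, joined)
    for the concatenation that best matches target, or None if none fits."""
    best = None
    for perm in permutations(components):
        query = "".join(perm)
        if len(query) > len(target):
            continue
        ident, off = best_ungapped_match(query, target)
        if best is None or ident > best[0]:
            best = (ident, off, query)
    return best


target = "GTAGGTAATCTAAACTTTAAGGATG"
print(order_components(["AAACTTTAAG", "TAATCT"], target))
```

Here the order TAATCT + AAACTTTAAG matches the target perfectly (16 identities) at offset 5, whereas the reverse order does not.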


Sequence of the nucleocapsid gene 

The extreme 3' gene of MHV-A59, corresponding to RNA7, is known to encode the
viral nucleocapsid protein (7,13). Translation of the sequence (Fig.2) gives
only one long open reading frame, from nucleotides 154 to 1515, predicting a
protein of mol. wt. 49,660 which is enriched in basic residues. These features
are entirely consistent with the observed electrophoretic mobility of the
nucleocapsid protein (2) and its function of binding to viral RNA. There is
also a short open reading frame between nucleotides 218 and 488, predicting
a polypeptide of 90 amino acids, but there is at present no evidence for the
existence of such a protein.
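Both the open-reading-frame search and the molecular-weight estimate are mechanical once the sequence is known. The sketch below scans the three forward frames of the mRNA-sense sequence for ATG-initiated frames ending in a stop codon, using the standard genetic code, and sums average amino acid residue masses plus one water to estimate the mass of the product. The minimum ORF length, the coordinate convention (1-based, from the A of ATG to the last base before the stop) and the function names are assumptions; the mass table is the standard average-residue set, which may differ slightly from the values behind the figure quoted above.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG-ordered table.
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# Average residue masses (Da): amino acid masses minus one water.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524


def translate(seq):
    """Translate codon by codon; unrecognized codons become 'X'."""
    return "".join(CODE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_codons=100):
    """ATG-to-stop reading frames in the three forward frames.

    Returns (start, end, protein) with 1-based start (A of ATG) and end (last
    base before the stop codon). Frames still open at the end are skipped.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and CODE.get(codon) == "*":
                if (i - start) // 3 >= min_codons:
                    orfs.append((start + 1, i, translate(seq[start:i])))
                start = None
    return orfs


def protein_mass(protein):
    """Average molecular mass of a polypeptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER
```

Applied to the sequence of Fig.2, `find_orfs` followed by `protein_mass` on the resulting protein would reproduce this kind of analysis; the output is not stated here because the extracted sequence has not been checked base by base.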

A search for homology with the nucleocapsid gene sequences of snowshoe hare
bunyavirus (23), Sindbis virus (24), influenza virus (25) and vesicular
stomatitis virus (26), using the computer program SEQFIT (21), revealed no
significant similarities, consistent with the coronaviruses forming a
distinct viral group.
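A classic way to look for homology between two sequences is a dot-matrix comparison: slide a window along both sequences and record every pair of positions where the windows agree at a preset number of sites. The sketch below illustrates that idea only; it is not SEQFIT, and the function name, window size and match threshold are arbitrary assumptions. In practice a threshold is judged against comparisons with shuffled or unrelated sequences, because short matches arise by chance.

```python
def dot_matrix(a, b, window=11, min_matches=9):
    """Return (i, j) pairs where a[i:i+window] and b[j:j+window]
    agree at no fewer than min_matches positions (ungapped)."""
    hits = []
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            matches = sum(x == y for x, y in zip(wa, b[j:j + window]))
            if matches >= min_matches:
                hits.append((i, j))
    return hits


print(dot_matrix("AAAACGTAC", "TTACGTATT", window=5, min_matches=5))
```

Runs of hits along a diagonal indicate a region of similarity; scattered isolated hits, as expected for unrelated sequences, do not.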

This is the first primary structure, of either nucleic acid or protein, to be
determined from a coronavirus. Experiments are now in progress, using
restriction fragments prepared from the clones described above, to determine
the precise sequence relationships between the virion and messenger RNA's, and


Sequence No.     140       150       Met
                 GTAGGTAATCTAAACTTTAAGGATG
Oligonucleotide
19               [r.T.AAAT,CITAATCTAAACTTTAAG
19a              (..TAAAT,CITAATCTAAACTATATG
17               fAacjcJTAATCTAAACTTITAAG

Figure 3


Alignment of oligonucleotides described by Lai et al. (22) with the sequence 
from Fig.2. Oligonucleotide 19 was found uniquely in RNA7, 19a uniquely in 
RNA6 and 17 in all the viral RNA's except RNA7 (6,22). 
to clone, specifically, the gene adjacent to the nucleocapsid gene, which
encodes the viral glycoprotein E1 (13,14).